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We describe a method to determine the eigenvalue density of empirical 
""O ■ covariance matrix in the presence of correlations between samples. This 

£h ' is a straightforward generalization of the method developed earlier by the 

authors for uncorrelated samples . The method allows for exact determi- 
nation of the experimental spectrum for a given covariance matrix and given 
correlations between samples in the limit N — > oo and N/T = r = const 
with N being the number of degrees of freedom and T being the number 
of samples. We discuss the effect of correlations on several examples. 

m . 

t^J- ; PACS numbers: 02.50.-r, 02.60.-x, 89.90.+n 

OO 

. Spectral properties of empirical covariance matrices have been inten- 

sively studied for uncorrelated samples [21 El E] • The eigenvalue distribution 
of the empirical covariance matrix is related to the eigenvalue distribution of 
the true covariance matrix by Marcenko-Pastur equation |2J. This relation 
plays an important role in many practical problems ranging from physics, 
. telecommunication [3] and information theory :6] to biology and quantita- 

£3 | tive finance [7]. Recently the corresponding equations has also been derived 

for correlated samples jS]. 

We will discuss some practical applications of these equations. More 
precisely we will show how to use them to explicitly compute the eigenvalue 
density of the empirical covariance matrix when the true covariance matrix 
is given. This method is of importance for instance for the problem of 
optimal portfolio selection if one considers strategies based on the analysis 
of very short time price changes which are known to be correlated in time, 
or for problems in telecommunication where one discusses propagation of 
signal through a random medium from a certain number of senders to a 
certain number of receivers (see e.g. jH] and references therein) and for 
many other problems. 
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To set the stage let us briefly recall the mathematical formulation of the 
problem. Let X = (Xi a ) be a real rectangular matrix of dimension N x T 
representing sampled values of a statistical system with iV-degrees of free- 
dom obtained in T measurements, a-th column of the matrix X contains N 
numbers obtained in the a-th measurement. The standard estimator of the 
covariance matrix C, describing correlations the degrees of freedom, is given 
by the matrix 1 c = (l/T)XX r : Cij = (1/T) ^2 a Xi a Xj a which is called em- 
pirical covariance matrix. X r stands for the transpose of X. The Marcenko 
and Pastur solution [2] refers to the situation when the measurements are 
uncorrelated: 

(Xi a Xjp) = 5 a pCij. (1) 

It relates the eigenvalue density of c to that of C. In the presence of cor- 
relations the Kronecker delta is replaced by an arbitrary symmetric semi- 
positive matrix A describing correlations between measurements: 

{XiaXjp) = A a pCij. (2) 

We shall discuss this case in the present paper. Our aim is to recall the 
formal solution [H] and to show how to use it to calculate spectral density 
p c of the empirical covariance matrix c, for given matrices A, C. 

The equations for the eigenvalue density p c can be derived by assuming 
that statistical fluctuations in the studied system are Gaussian. Then the 
matrix X can be treated as a random matrix chosen from the Wishart 
ensemble with the following probability measure EH : 



P(X) DX. = Afexp 



N,T 

-Tr X r C" 1 XA- 1 
2 



n dx ia , (3) 



i,a=l 



NT T N 

where the normalization constant is J\f = (2-7r)~~ (det C) - 2"(det A)~"2". 
One can easily check that correlations between matrices chosen randomly 
from this ensemble are given by (|2"|). Applying a diagrammatic technique 
[TT1 IT21 ITS] one can calculate averages of various quantities, in particular 
the resolvent of the empirical covariance matrix: 

The brackets denote the average over the ensemble of random matrices X 
with the probability measure (J3J). It is convenient to express g c in terms of 



1 We shall assume throughout the paper that (Xi a ) = 0. 
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generating functions for spectral moments of C, A and c: 

«°w - t % «* w - E -w - E ^ . P) 

fc=i fc=l fe=l 

where Met = f pc(x)x k dx are spectral moments of the eigenvalue density 
of the matrix C and similarly Ma and m c of A and c. The generating 
functions are related to the corresponding resolvents, in particular m c (z) = 
zg c (z) - 1. 

In the limit of N — > oo and fixed "rectangularity" coefficient r = N/T 
the generating functions Mq and m c are related by two equations [S]: 

m c (z) = M C (Z), 
z = ZrM c (Z)M A l (rM c (Z)), (6) 

where M^ 1 is the inverse function of the generating function Ma- From 
these equations one can formally calculate m c {z) when A and C are given. 
The variable Z is auxiliary. Having determined m c (z) from the equations 
(JHJ) one can calculate the resolvent g c (z) = (m c (z) + l)/z and the eigenvalue 
density p c {z) of the empirical covariance matrix c: 

p e (x = Im<7 c (x + iO+ = Im ^— >- . 7 

Before we show how to calculate m c (z) in the most general case we will first 
recall the case without correlations that is for A a p equal to the Kronecker 
delta. The equations © simplify to ^1 : 

m c (z)=M c (Z), 
z = Z(l + rM c (Z)). (8) 

This set of equations is equivalent to the Marcenko-Pastur equation [2]. It 
can be directly used to determine the eigenvalue density p c (x) p. We will 
sketch the method below. Let Ai < • • • < be ordered eigenvalues of C 
and n/% their multiplicities. The generating function Mq(Z) takes the form: 

Mc(£)=E!zX, (9) 

k=l k 

where pk = n^/N. Using (JHJ) and © we can determine m c (z) and hence from 
Eq. (J2J) the eigenvalue density function p c (x). The shape of the eigenvalue 
density p c {x) is encoded in the behavior of the conformal map (jHJ) near the 
critical horizon [Q, defined as a curve on the Z-plane which is mapped by 
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(jSJ into an interval on the real axis in the z-plane. Here X-,x+ 

denote the upper and the lower edge of the support of the function p c (x). 
The critical horizon may be determined as follows: let Z = X + iY be a 
point on the horizon. Because the eigenvalues are real the imaginary part 
of z = z{Z) must vanish. This gives the equation: 



Solving this equation for Y = Y{X) for given X we find the desired curve 
giving the critical horizon: X+iY(X). The equation 101) has two symmetric 
roots ±y because of the ambiguity of the map z = z(Z). They can be 
found numerically by an iterative method taking as an initial value Yq a 
small positive number. The positive root is mapped into z = x + i0 + . The 
corresponding values of x = z(Z) and p c {x) are given by: 



Treating Z as a dummy parameter we obtain a pair (x,p c ) which is equiv- 
alent to the eigenvalue density p c (x). Briefly speaking, the variable Z = 
X + iY is used to parametrize both the eigenvalue x and the spectrum p c . 

The variable X is an independent variable in this construction. It is 
restricted to a finite range X £ [X—,X+\. The limits of this range are 
mapped into the lower, x_, and the upper edge, x + , of the spectrum p c (x). 
They can be determined numerically by observing that they come from the 
points where the critical horizon intersects the real axis. Setting Y = in 
Eq. UlUj) we obtain an equation for X whose largest root corresponds to X + 
and the smallest to X-. The roots can be found numerically. 

The advantage of this method in comparison with the others 0:4 is that 
one has to solve only one algebraic equation for one point in the spectrum, 
which can be done either analytically (for K < 4 or some other special 
cases) or numerically. A more detailed discussion can be found in PQ. 

Encouraged by the success of this approach we want to adopt it to the 
general case of correlated samples (J2J). We have to return to the equations 
(JHJ). For given matrix C we can calculate Mq(Z) © and similarly for A: 




(10) 




(11) 



and 




(12) 




(13) 
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We have indexed the parameters of the spectrum of the matrix A by Greek 
indices to distinguish them from the parameters of the spectrum of C. As 
before, the first step of the construction is to parametrize critical horizon 
Z by one real variable. However, the situation is now more complicated 
because we have to invert Ma which may give a multi-valued function. 
Let us introduce the notation: Z = X + iY and M^ 1 = U + iV. Now, 
x(Z) = z(Z) takes the form: 

v PkXk [U(X 2 + Y 2 ) + X k (YV - XU)] 
X T ^ {X-X k Y + Y^ 

. \- PkX k [V(X 2 + Y 2 ) - X k {XV + YU)] 

{X-X k ) 2 + Y* ' li4j 

and the condition x £ R implies that 

FAX,Y,U,V)s^2 »** [V[X \t% ^ + YU)K °- ^ 

In order to invert Ma and to calculate U, V we use the relation rMc (X + 
iY) = Ma_(U + iV) which gives: 

( % p k X k (X -X k -iY) p a K{U -A a - iV) 



E p k A k \y\ - Afc — il J _ ST^ 
( Y - \, ^2 i V 2 ~ 



(X-A fc )2 + y2 ^ ([/ _A Q )2 + y2 • 

Comparing the real and imaginary parts we get: 

V (TT VTTV\- ST PkX k (X-X k ) \- p a A a (U-A a ) 

F 2 (X, Y,U,V) = r^ {x _ — - ^ — + y2 =0, (16) 

F 3i X, Y, U, V) = rY^2 (JC _ffi + Y , " ^ £ {U X? + V 2 = ° 

For fixed X, the set of three equations (|15j ) -(|17j ) has to be solved numerically 
for three unknown variables Y,U,V. Then we can use formulae (j!2j) and 
(|14j) to determine the spectrum p c (x), as it was done in the previous case 
for A = 1. 

These new equations are much more complicated than before and have 
no such a beautiful graphical interpretation as Eq. (|10|) . One can imme- 
diately realize that it is easy to eliminate the variable U from Eq. (|15[) to 
obtain a set of two equations for two variables Y, V. This makes however 
the equations less transparent and does not simplify the calculations. 
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There are no universal root finding procedures for a nonlinear set of 
equations. Moreover all of them are usually very sensitive to the choice 
of the initial condition and therefore it is difficult to guarantee that the 
procedure will converge to the wanted solution starting from some initial 
condition unless a special care is payed. In other words we have to make 
some effort to fully automatize root finding on a desired Riemann sheet. 
The idea is to use the continuity of the solution as we shall explain below. 
Rewrite Eqs. (|15 |) -([17 j) in a compact form: 

F(X,r) = 0, (18) 

where f = (Y, U, V) = (ri,r2,r^) and F = (Fi, F2, F3). Instead of a single 
equation (UOH we have now three equations which must be solved for a 3- 
vector f. The solution r = r{X) is a continuous function of X. For given X 
different solutions r{X) of Eq. (|18|) which lie on different Riemann sheets 
assume different values as long as they are outside the real axis. So they can 
be distinguished by value. Suppose that we know the value f = r(Xo) for 
some Xq and that (Y(Xq), U(Xq), V(Xq)) is the desired solution. In order 
to calculate r for X = Xq + dX we can use the first term of the Taylor 
series: 

~dr(X)~ 



f(X + dX) r(X ) + 



dX 



■ dX (19) 
x=x 



as an initial point for the root finding procedure. The partial derivatives of 
r with respect to X can be analytically found by differentiating l|18|l which 
gives: 

where Q is the inverse of the matrix (dFi/drj)ij, whose elements can be 
explicitly calculated from Eqs. p5 |) -(|17 |) . Because of the continuity of the 
solution this procedure must be convergent to the solution on the same 
Riemann sheet as for Xq if dX is small enough. Then we can keep on 
repeating the whole procedure moving in small steps X — ► X + dX along 
the same solution. The procedure is stopped when Y < because this 
means that we have reached the point p c (x) = 0. Thus if we can guarantee 
that t{Xq) is on the correct Riemann sheet, all other r(X) obtained in this 
procedure will be on the same sheet. The check, whether f{Xo) lies on the 
desired Riemann sheet, does not have to be very efficient since it is done 
once per run or if the spectrum consists of disconnected parts, it should be 
done as many times as is the number of parts. 

Let us remark that the parameter X, being just the real part of Z, which 
we used here to parametrize the critical horizon is not always well suited 
to this purpose. In general case one has to use a parameter which uniquely 
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parametrizes the solution. In all the cases discussed below except one, X 
does the job. 

In some cases, where the map between z and Z is known explicitly one 
can simplify the method. Exponential correlations of samples, which are 
physically important, belong to this class: 

A^ = expf-i^iY (21) 
In this case an exact formula of the map z = z (Z) is known 2 [S]: 



z = Z ^coth(l/r) • rM c {Z) + ^1 + s . nh(1/r)2 M 2 (Z) j , (22) 

where r is the range of correlations. In the limit r — ► 0, the quantities 
sinh(l/r) — ► oo, coth(l/r) — ► 1 and in consequence Eq. (|2*2*|) simplifies 
to (JSJ) as it should. Because of the presence of the square root in (|22jl 
it is hard to divide the above formula into the real and imaginary part. 
This is however not a problem for a numerical root finder. Changing X in 
small steps between the limiting values X_ and X + we solve the equation 
Im (z(X + iY)) = for Y = Y(X) and calculate x = z(X + iY) and p c (x) 
by means of the formula ()12)) . The limiting values X± which correspond 
to the largest and smallest real value on the horizon can be determined 
by solving the equation Im (z(X + iY)) = for X on real axis that is for 

The influence of the correlation time r on the map z = z(Z) is illustrated 
in Fig. ^ Here C has two eigenvalues {Aj} = {1,2} with equal weights 
pi = P2- One sees that for small r and for r being not very large the only 
effect of increasing r is similar to increasing r. However, for larger r and 
growing correlation time the map is deformed. This can be easily explained, 
because for small rj sinh(l/r) <C 1 the square root in the formula 1)221) is 
approximately equal 1. This means that formula 1)22)1 reduces to the case 
(jHJ) without correlations but with new r' = r ■ coth(l/r). When the ratio 
r/sinh(l/r) becomes larger the corrections begin to play an important role 
in the map ()22j) . 

This observation indicates that the presence of exponential correlations 
between samples may be confused with pure correlations (C / 1,A = 1) 
but with modified "rectangularity" coefficient r (see Fig. 

Consider now a practical example. In the previous paper we discussed 
the evolution of the eigenvalue density of the experimental covariance matrix 



Actually in 8 rather a relation Z — Z(z) is given but it can be easily inverted for 
z = z(Z). 
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Fig. 1. Critical horizon for exponential correlations and {A^} = {1, 2}. The horizon 
is symmetric about the X axis; only upper part is shown. Left: r = 0.1 and 
r = 0,1,2,5 from the inner to the outer horizon. Right: for r = 0.5 and r = 
0, 1, 2, 5. Deviations from the shape expected in case of A = 1 grow while the ratio 
r/sinh(l/r) increases. 

1.5, . 1 . 1 . 1 . 1 1 .5 1 . 1 . 1 . 1 . 1 1.5, . 1 . 1 . 1 . 1 
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Fig. 2. Eigenvalues densities p c (x) for pure Wishart C = A = 1 with fixed r = 0.1 
(solid line), C = l,r = 0.1 and exponential correlations l|21|) (dotted line) and for 
pure Wishart with modified r = 0.1 • coth(l/r) (dashed line) for different r: r = 1 
(left picture), r = 2 (middle picture), r = 5 (right picture). The distributions 
represented by dashed and dotted line behave similarly. In practice, when one 
reconstructs them only from one set of eigenvalues, they can be easily mixed up. 

c with the number of independent measurements for N = 18 eigenvalues, 
obtained from data for daily returns of 18 stocks on the Polish Stock Mar- 
ket. Here we will discuss how the spectrum changes in case of correlated 
returns as for example happens in short time horizons. We will use the 
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Fig. 3. The plot of p c (x) for N = 18 eigenvalues of C taken from Polish Stock 
Market. Solid: r = 18/255, t = 0, dotted: r = 18/255, t = 10, dashed: r = 
180/255, t = 0. Inset: the "bulk" of spectra for r = 18/255 and r = 0, 10 calculated 
(solid line) and found experimentally (dotted line) for sample of 3 x 10 5 Wishart 
matrices of size N = 18 generated by a Monte-Carlo procedure. 

same eigenvalues as before [Q. In the Fig. |21 p c is shown for the three 
cases: for r = 18/255 without correlations (r = 0), for r = 18/255 and 
r = 10, and finally for r = 10 • 18/255 and r = 0. We see that the first case 
deviates very much from the two remaining ones. Comparing the second 
and the third case we see that they follow the same trends. Technically in 
the second example one has ten times more data per degree of freedom but 
because the correlation time is equal to ten this means that roughly only 
every tenth data point introduces a new information. Therefore intuitively 
the second and the third case have a similar statistical content. Indeed a 
quick look at Eq. (122|) shows that if one approximates the contribution from 
the square root by unity neglecting the term proportional to r 2 then this 
equation will assume the same form as Eq. (jSJ) with an effectively rescaled 
parameter r: r — ► r • coth(l/10) ~ lOr. This approximation is legitimate 
only if rj sinh(l/r) <C 1 as in the discussed example. 

A similar observation can be made in a general case of arbitrary corre- 
lations A. Consider the map (jHJ) in case of small rCl. Assuming that the 
radius of the horizon is of order yfr as in case without correlations, from (jUJ) 
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it follows that \rMc(Z)\ ~ \fr on the horizon. We can invert M A (Z') for 
small Z'\ 



where the coefficients of the series are expressed by the moments 8 : 

_ Ma2 _ M A3 M A1 - (M A2 ) 2 

W (M A1 )2 ' 1X2 (M Al f ' ■ 

Then the Eq. Q takes the form: 



The multiplicative factor M A \ merely redefines z and does not affect the 
map z = z(Z). An important difference in comparison with Eq. (JHJ) is 
the change of the coefficient at the linear term in r which tells us that for 
sufficiently small r the map (|25j) can be viewed as © but with of a modified 
parameter r: r — > r ■ [i\ = rM A 2/(M A i) 2 ■ 

To illustrate this, let us consider an example. Take C = 1 and A 
having two eigenvalues one of which being Ai = 1 and the other A 2 being 
a free parameter. Additionally assume that the two eigenvalues have the 
same multiplicities and hence p\ = P2 = 1/2. The model looks somewhat 
artificial but it well illustrates a feature which is quite general. 

In the upper part of Fig. 0]we compare positions of the left border A_ 
of the critical horizon © calculated in two different ways: numerically for 
different r, and analytically for the Wishart model without correlations with 
an effective "rect angularity" parameter r' = fii-r, which gives A_ = 1-vr 
where m = 2(1 + Af,)/(1 + A 2 ) 2 . The analo gous comparison is made for 
the position of left border x_ of the eigenvalue density function p c (x) in 
the bottom of Fig. |1J Here we rescale additionally axes: x — > x ■ M A \,y — > 
y/MAi with the factor Mai = (1 + A 2 )/2 as in Eq. (|25|) . 

The corresponding eigenvalue distributions of the empirical covariance 
matrix c for these two cases for r = 0.1 are shown in Fig. [3 One sees an 
excellent agreement between them. This means that indeed for sufficiently 
small r the only influence of A on the spectrum of c is an effective change 
of r and rescaling the axes. 

Let us shortly summarize. Using the relation between the moments gen- 
erating functions © (or equivalently between the resolvents) we have shown 
how to effectively compute the eigenvalue spectrum of the empirically deter- 
mined covariance matrix for given correlations, even in the case of correlated 
measurements. As an example of the application of the method we have 
demonstrated the influence of the exponential correlations between mea- 
surements on the eigenvalue spectrum of the covariance matrix calculated 



M A 1 (Z') = M A1 Z'^ (1 + mZ' + ii 2 Z' 2 + ...) 



(23) 




(25) 
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Fig. 4. Top: the position of left border X- of the map z = z(Z) for C = 1 and 
{A Q } = {1,A 2 } (solid line) and pure Wishart C = A = 1 with r' = rpi(A 2 ) 
(dashed line), from the left: r = 0.01, 0.1, 0.5. Bottom: the same for the left border 
x_ of spectrum p c (x) with additionally rescaled x axis: x — > x-Mxi for the Wishart 
case. The deviations from the rescaled Wishart spectrum become significant for 
r > 0.1. 

for stocks' logarithmic returns. We have argued that in the limit of the large 
number of samples that is for T » TV, or equivalently for r = N/T <C 1, the 
eigenvalue density of the empirical covariance matrix can be approximated 
by the eigenvalue density for a reduced number of uncorrelated samples, 
with the reduction factor being approximately inversely proportional to the 
correlation time for the original correlated samples. 
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Fig. 5. The spectra for r — 0.1, C = 1 and A having two eigenvalues {1, A2} (solid 
line), and for rescaled Wishart (dashed line): r — > rfii,x — ► x ■ Mai, V — > u/Mai- 
From the left to the right: A 2 = 1, 2,4. 
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